ABSTRACT


One of the early applications of the fast Fourier transform was in
spectral analysis of speech waveforms, and it was immediately evident
that, at least in principle, speech spectrograms could be generated by
computer using the FFT. In this note the procedure and results of such
an experiment are presented.
COMPUTER GENERATED SPEECH SPECTROGRAMS


One of the early applications of the fast Fourier transform (FFT) algorithm
was in spectral analysis of speech waveforms for bandwidth compression,
automatic speech and speaker recognition, speech displays, etc. It was
immediately evident, and was suggested both formally and informally, that,
at least in principle, speech spectrograms could be generated on a computer
using the FFT. The purpose of this note is to present the procedure and
results for one such attempt.

The computational procedure is as follows. Each segment of the speech to be
analyzed is weighted with a Hanning window and transformed using the FFT,
and the spectral magnitude is then computed. The number of points
transformed, and the time increment between spectral cross-sections, depend
in each case on the tradeoff desired between time and spectral resolution.
An attempt was made to keep the analysis time approximately the same for
wide and narrow band spectrograms. As a consequence, when the window
duration, and with it the size of the Fourier transform, is doubled, thereby
increasing spectral resolution, the time increment between spectral
cross-sections is also doubled.
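
The procedure just described can be sketched in Python as follows. This is a
minimal illustration, not the program used for this note: the function names,
the periodic form of the Hanning window, and the small radix-2 FFT routine are
assumptions made to keep the example self-contained (a library routine such as
numpy.fft.rfft would normally be used instead). The frame lengths of 512 and
128 points, with increments of 96 and 24 samples, correspond to the 9.6 and
2.4 msec increments at a 10 kHz sampling rate reported later in this note; the
1 kHz test tone is hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hann(n):
    """Hanning (raised-cosine) window of length n, periodic form (assumed)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def spectral_cross_sections(samples, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len/2) per frame.

    Frames start every `hop` samples; each is windowed, transformed,
    and reduced to its spectral magnitude.
    """
    window = hann(frame_len)
    sections = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        spectrum = fft(frame)
        sections.append([abs(c) for c in spectrum[: frame_len // 2 + 1]])
    return sections


# Hypothetical test signal: a 1 kHz tone sampled at 10 kHz.
fs = 10_000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(2048)]

narrow = spectral_cross_sections(tone, frame_len=512, hop=96)  # 9.6 msec steps
wide = spectral_cross_sections(tone, frame_len=128, hop=24)    # 2.4 msec steps
```

In both settings the increment is 3/16 of the window length, so the number of
cross-sections per second falls in proportion as the transform size grows,
which is the sense in which the analysis time is held roughly constant.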

The spectrograms are displayed on a conventional Hewlett-Packard 1200 AR
oscilloscope. The intensity to be displayed is used to modulate the duration
of an unblanking pulse applied to the z-axis. The resulting hard copy is
obtained from a time exposure, using Polaroid type 52 film. In the present
system, the spectrograms are displayed on a grid with 256 points on the
vertical (frequency) axis and 512 points on the horizontal (time) axis.
Values on this grid are obtained from the computed values, which lie on a
less dense grid, by linear interpolation.
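
As an illustration of the interpolation step, the following Python sketch maps
a coarse array of computed values onto the 256 by 512 display grid. The note
states only that linear interpolation is used; interpolating separately along
the frequency axis and then the time axis, aligning the first and last points
of the two grids, and the function names are all assumptions made for this
example.

```python
def resample_linear(values, n_out):
    """Linearly interpolate a 1-D list onto n_out evenly spaced points.

    The first and last output points coincide with the first and last
    input points (an assumed alignment).
    """
    n_in = len(values)
    if n_in == 1:
        return [values[0]] * n_out
    out = []
    for j in range(n_out):
        pos = j * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = min(int(pos), n_in - 2)      # index of the left neighbour
        frac = pos - lo                   # fractional distance to the right
        out.append((1.0 - frac) * values[lo] + frac * values[lo + 1])
    return out


def to_display_grid(sections, n_freq=256, n_time=512):
    """Map sections[time][freq] onto an n_time x n_freq display grid."""
    # Interpolate each cross-section along the frequency axis.
    rows = [resample_linear(s, n_freq) for s in sections]
    # Then interpolate each frequency line along the time axis.
    columns = [
        resample_linear([row[f] for row in rows], n_time)
        for f in range(n_freq)
    ]
    return [[columns[f][t] for f in range(n_freq)] for t in range(n_time)]
```

For example, 81 cross-sections of 65 values each would be expanded by
to_display_grid to 512 time points of 256 frequency values each.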

The contrast and high frequency shaping desired in spectrogram displays are
generally dependent on the specific application for which the displays are
to be used. Consequently, a provision for varying both has been incorporated
in the present system. Variable contrast is incorporated by applying a
variable exponent to the spectral magnitude prior to D-A conversion.
Frequency shaping is then applied by multiplying this modified spectral
magnitude by a frequency dependent gain. The overall system is illustrated
in Fig. 1.

Figures 2 and 3 show some typical spectrograms obtained in this way,
together with the corresponding voiceprint spectrograms for reference. The
sentence is "He took a walk every morning." as spoken by a male speaker.
The input speech has been pre-emphasized at 6 dB/octave starting at 350 Hz
and sampled at a 10 kHz rate. Figure 2a shows a narrowband voiceprint
spectrogram of the pre-emphasized speech with high frequency shaping.
Figures 2b, c, d correspond to a narrowband spectrogram, computed using a
window length (and Fourier transform size) of 512 points. A new spectral
cross-section is computed every 9.6 msec. In Fig. 2b, the parameter y as
defined in Fig. 1 is unity and no frequency shaping has been applied.
Figure 2c corresponds to no frequency shaping but a y of (2/3). Figure 2d
represents a y of unity and linear shaping starting at 1.25 kHz with a slope
of 1.6 (1/kHz). Figure 3a shows a wideband voiceprint spectrogram of the
pre-emphasized speech with high frequency shaping. Figures 3b, c, d
correspond to a wideband spectrogram, computed using a Fourier transform
size of 128 points. A new spectral cross-section is computed every 2.4 msec.
In Figs. 3b to 3d, the contrast and frequency shaping are the same as in
Figs. 2b to 2d respectively. Figure 3e is identical to Fig. 3d except for an
expanded time scale.
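
The contrast and shaping operations can be sketched in Python as follows.
The order of operations (exponent first, then gain) follows the description
above, and the default values are those quoted for Fig. 2d. The exact form of
the gain is defined in Fig. 1, which is not reproduced here, so the form used
below (unity up to 1.25 kHz, then rising linearly by 1.6 per kHz) is an
assumption, as are the function names and the argument name `exponent` for
the parameter written y in the text.

```python
def shaping_gain(freq_khz, start_khz=1.25, slope_per_khz=1.6):
    """Assumed linear shaping: unity below start_khz, then a linear rise."""
    return 1.0 + slope_per_khz * max(0.0, freq_khz - start_khz)


def display_intensity(section, fs_hz, exponent=1.0, shape=False):
    """Apply the contrast exponent, then the optional frequency shaping.

    `section` holds spectral magnitudes for bins spaced evenly from
    0 Hz to fs_hz / 2.
    """
    n_bins = len(section)
    nyquist_khz = fs_hz / 2000.0
    out = []
    for k, magnitude in enumerate(section):
        value = magnitude ** exponent           # variable contrast
        if shape:
            freq_khz = nyquist_khz * k / max(n_bins - 1, 1)
            value *= shaping_gain(freq_khz)     # frequency dependent gain
        out.append(value)
    return out


# Settings corresponding to the three computed panels of each figure.
flat = [1.0] * 5                                          # hypothetical input
panel_b = display_intensity(flat, 10_000)                 # y = 1, no shaping
panel_c = display_intensity(flat, 10_000, exponent=2 / 3) # y = 2/3, no shaping
panel_d = display_intensity(flat, 10_000, shape=True)     # y = 1, shaping
```

An exponent below unity compresses the range of magnitudes, so weaker
components occupy more of the available display intensity.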

One of the potential advantages of computer generated spectrograms is the
tremendous flexibility available. Bandwidths, for example, can be chosen to
more nearly match the kind of speech being analyzed and in fact can be time
variable, controlled for example by the spectral derivative or the
fundamental frequency. Furthermore, the spectrogram display can be more
precisely correlated with other speech displays such as the waveform and
spectral cross-section. It seems reasonable to anticipate either special
purpose digital hardware or computer attachments that can provide both hard
copy and on-line spectrographic displays of speech, with a quality
comparable to or perhaps better than other presently available
spectrographic displays.

Fig. 1. Block diagram for obtaining and displaying spectrograms.

Fig. 2. Narrowband spectrograms.

Fig. 3. Wideband spectrograms.